Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model log compression #532

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

blimlim
Copy link
Contributor

@blimlim blimlim commented Oct 16, 2024

This draft PR addresses #527.

It adds a compress_log_files method, initially just to the cice.py driver. This is called during the archive step and compresses all the log files matching some specified regex patterns into a tarball. A compress_logs option in the config.yaml file controls whether to run the compression.

I've marked this PR as a draft, as I'd be keen to see if the implementation looks reasonable to others, and if so I think it would be good to move the changes over to the general model class making it available to each model.

@pep8speaks
Copy link

pep8speaks commented Oct 16, 2024

Hello @blimlim! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2024-11-18 01:13:45 UTC

@coveralls
Copy link

coveralls commented Oct 16, 2024

Coverage Status

coverage: 58.624% (+0.3%) from 58.361%
when pulling d4e3334 on ACCESS-NRI:527-compress-model-logs
into 7c885b8 on payu-org:master.

@aidanheerdegen
Copy link
Collaborator

I'd be tempted to add compress_log_files as a sub-option to archive. Something like

archive:
    enable: True
    compress_logs: True

and make compress_logs default to True.

Currently archive is a boolean, but we can support it being a dict as well. c.f. the collate option, which is converted to a dict from a boolean configuration option:

payu/payu/fsops.py

Lines 108 to 112 in 7c885b8

collate_config = config.pop('collate', {})
# Transform legacy collate config options
if type(collate_config) is bool:
collate_config = {'enable': collate_config}

It would also require changing the code to check if archive is enabled to something like this

payu/payu/experiment.py

Lines 827 to 828 in 7c885b8

collate_config = self.config.get('collate', {})
collating = collate_config.get('enable', True)

@blimlim
Copy link
Contributor Author

blimlim commented Oct 17, 2024

I'd be tempted to add compress_log_files as a sub-option to archive

I've had a go at implementing this in 0b2fcc9. See comments for further details!

@blimlim
Copy link
Contributor Author

blimlim commented Oct 17, 2024

Currently cice4 in ESM1.5 is the main model causing problems, but there are a couple of options for generalising this, making it available to other models:

  1. Just move the three methods cice.compression_enabled(), cice.get_log_files(), cice.compress_log_files() from the cice.py driver to the model.py driver, and add self.logs_to_compress = [] to the general model.__init__. Compression can be added to other models by adding the required regex patterns to self.logs_to_compress = [] in the relevant model driver.

  2. Leave the methods in cice.py, add methods with the same name to model.py that just pass. Compression can then be implemented in other models by overriding these methods in the relevant driver.

I think the methods should work generally enough that option 1 would work, and I think this would be a bit simpler/easier to follow. Keen to hear other ideas or opinions though!

@blimlim blimlim marked this pull request as ready for review November 4, 2024 22:25
@blimlim blimlim marked this pull request as draft November 4, 2024 22:26
@blimlim blimlim marked this pull request as ready for review November 4, 2024 23:09
@blimlim
Copy link
Contributor Author

blimlim commented Nov 4, 2024

@aidanheerdegen I've just marked this one as ready to review! I think the main outstanding question is whether we are happy with the log compression being implemented in the cice.py driver to start off with, or if we prefer to have it more general.

Copy link
Collaborator

@aidanheerdegen aidanheerdegen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to keep it in the cice driver to begin with as this is the only model for which it is desperately required. Then we can test it operationally and tweak it if necessary.

I have suggested a couple of changes, but push back if you think they're unnecessry.

payu/models/cice.py Show resolved Hide resolved
payu/models/cice.py Outdated Show resolved Hide resolved
test/models/test_cice.py Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants